Automatic Complex Instruction Identification with Hardware Sharing for Efficient Application Mapping onto ASIPs
Authors
Abstract
Thesis presented to COPPE/UFRJ in partial fulfillment of the requirements for the degree of Doctor of Science (D.Sc.)
Alexandre Solon Nery
December 2014
Advisors: Felipe Maia Galvão França, Nadia Nedjah, Lech Jóźwiak, Henk Corporaal
Department: Systems Engineering and Computer Science
Custom instruction identification is an essential part of designing efficient Application-Specific Instruction Set Processors (ASIPs). This thesis proposes and discusses a novel, efficient instruction set customization method, together with an automatic tool that identifies promising custom instruction candidates for a set of relevant benchmark applications. The proposed method formulates the common subgraph enumeration problem as a maximum clique-enumeration problem, with a two-fold novel contribution: one concerning connectivity, the other concerning the detection of graph (re-)associativity. Performance results obtained with the proposed tool for a configurable VLIW-ASIP are provided, showing a speedup of up to 54% for the ray-tracing application. Circuit area and energy consumption results based on TSMC 65nm technology are also presented.
Moreover, this thesis analyzes and discusses the problem of hardware sharing in the context of instruction set customization. Although commercially available hardware synthesis tools can exploit some hardware sharing opportunities, this thesis shows that the result is usually unsatisfactory. Datapath merging techniques are therefore implemented and analyzed, achieving average circuit area and energy consumption savings of 30% for the sets of custom instructions identified in this thesis.
Finally, multi-core architectures are proposed, based on commercially available extensible ASIPs augmented with the identified set of custom instructions and with hardware sharing optimizations. Using up to eight ASIPs in parallel with complex instructions, a parallel ray-tracer implementation is proposed, achieving up to 12× speedup in comparison to a single ASIP design. The automatically identified custom instructions provided around 36% execution time reduction for the ray-tracing application.
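The abstract states that common subgraph enumeration is formulated as a clique enumeration problem. The following is a minimal sketch of that general idea, not the thesis tool: it builds a compatibility graph between two small, hypothetical dataflow graphs and enumerates maximal cliques with the textbook Bron-Kerbosch algorithm. The function names, the example graphs, and the edge-consistency rule are assumptions made for illustration; the connectivity and (re-)associativity contributions mentioned in the abstract are not modelled here.

```python
from itertools import combinations


def compatibility_graph(ops1, edges1, ops2, edges2):
    """Build a compatibility graph for two operation-labelled dataflow graphs.

    Assumed representation: ops maps node name -> operation label,
    edges is a set of directed (producer, consumer) pairs.
    """
    # A vertex pairs one node from each graph that performs the same operation.
    vertices = [(u, v) for u in ops1 for v in ops2 if ops1[u] == ops2[v]]
    adj = {p: set() for p in vertices}
    for (u1, v1), (u2, v2) in combinations(vertices, 2):
        if u1 == u2 or v1 == v2:
            continue  # each node may be matched at most once
        # Two pairings are compatible when they agree on both edge directions.
        same_fwd = ((u1, u2) in edges1) == ((v1, v2) in edges2)
        same_bwd = ((u2, u1) in edges1) == ((v2, v1) in edges2)
        if same_fwd and same_bwd:
            adj[(u1, v1)].add((u2, v2))
            adj[(u2, v2)].add((u1, v1))
    return adj


def maximal_cliques(adj):
    """Enumerate all maximal cliques (Bron-Kerbosch with pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda n: len(adj[n] & p))
        for n in list(p - adj[pivot]):
            expand(r | {n}, p & adj[n], x & adj[n])
            p = p - {n}
            x = x | {n}

    expand(set(), set(adj), set())
    return cliques


# Hypothetical example: graph A computes (x*y) + (z*w); graph B does the
# same and then feeds the sum into a subtraction.
ops_a = {"a1": "mul", "a2": "mul", "a3": "add"}
edges_a = {("a1", "a3"), ("a2", "a3")}
ops_b = {"b1": "mul", "b2": "mul", "b3": "add", "b4": "sub"}
edges_b = {("b1", "b3"), ("b2", "b3"), ("b3", "b4")}

cliques = maximal_cliques(compatibility_graph(ops_a, edges_a, ops_b, edges_b))
best = max(cliques, key=len)
print(sorted(best))
```

Each maximal clique corresponds to a common subgraph that cannot be extended with another matched node pair, so the largest cliques are natural custom instruction candidates in this simplified setting. Here the shared multiply-multiply-add pattern is recovered as a clique of three node pairs.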
Similar resources
Instruction Set Extraction From Programmable
Due to the demand for more design flexibility and design reuse, ASIPs have emerged as a new important design style in the area of DSP systems. In order to obtain efficient hardware/software partitionings within ASIP-based systems, the designer has to be supported by CAD tools that allow frequent re-mapping of algorithms onto variable programmable target structures. This leads to a new class of de...
Power efficient semi-automatic instruction encoding for application specific instruction set processors
A novel design methodology for the implementation of control units for application specific instruction set processors (ASIPs) is described. This methodology uses automatic instruction encoding and semi-automatic generation of the hardware instruction decoder to speed up the ASIP design. Significant power savings due to optimized instruction encoding are achieved. Results for ICORE (ISS-Core), ...
Instruction Set Definition and Instruction Selection for ASIPs
Application Specific Instruction set Processors (ASIPs) are field or mask programmable processors of which the architecture and instruction set are optimised to a specific application domain. ASIPs offer a high degree of flexibility and are therefore increasingly being used in competitive markets like telecommunications. However, adequate CAD techniques for the design and programming of ASIPs are mis...
Efficient code generation for ASIPs with different word sizes
We propose a complete methodology for extending our automatic ASIP (Architecture Specific Instruction set Processor) synthesis framework to a much wider target architecture space. In this new architecture space the width of the integer data word and of any hardware resource data path is user-definable and application specific. This methodology, developed on the basis of a retargetable C compile...
Automatic instruction-set architecture synthesis for VLIW processor cores in the ASAM project
The design of high-performance application-specific multi-core processor systems is still a time-consuming task which involves many manual steps and decisions that need to be performed by experienced design engineers. The ASAM project sought to change this by proposing an automatic architecture synthesis and mapping flow aimed at the design of such application specific instruction-set processor...
Publication date: 2015